Fine-needle aspiration biopsy is a quick, relatively non-invasive technique for assessing whether a breast lump is malignant (source). The biopsy samples are sent to pathologists, who examine the tissue to determine whether the cells are cancerous. Machine learning could help improve and streamline this process by extracting tissue characteristics (mean area, symmetry, concavity, etc.) from microscope images and feeding them into a stacked machine learning model that predicts the tissue’s malignancy (with 97.7% accuracy).
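To make the idea of a stacked model concrete, here is a minimal sketch using scikit-learn's StackingClassifier. It is not the model behind the accuracy figure above: the choice of base learners (a random forest and a support vector classifier), the logistic regression meta-learner, the train/test split, and the use of scikit-learn's bundled Wisconsin breast cancer dataset are all assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stacked_model(random_state=0):
    """Build a stacked classifier (base learners are illustrative assumptions)."""
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=200, random_state=random_state)),
        # The SVC is sensitive to feature scale, so it gets its own scaler.
        ("svc", make_pipeline(StandardScaler(), SVC(random_state=random_state))),
    ]
    # The meta-learner is trained on cross-validated outputs of the base learners.
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


def evaluate_stacked_model(random_state=0):
    """Fit on a training split and return accuracy on the held-out split."""
    # Assumed data source: scikit-learn's copy of the Wisconsin dataset,
    # where target 0 is malignant and target 1 is benign.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    model = build_stacked_model(random_state)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


if __name__ == "__main__":
    print(f"Held-out accuracy: {evaluate_stacked_model():.3f}")
```

The held-out accuracy from this sketch will depend on the split and the base learners chosen, so it should not be expected to reproduce the figure quoted above.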
Early detection is key to recovery from breast cancer. Women diagnosed at the earliest stage of breast cancer have a much greater chance of surviving (> 90%) than women diagnosed at the most advanced stage (only 5%, source). Data and machine learning approaches could help speed up breast cancer detection and diagnosis.
The scatterplot to the left illustrates the key differences between malignant (gray) and benign (red) breast tumors. Each point represents a breast tumor, positioned according to the characteristics of its cell nuclei. Points that are closer together are tumors with more similar cell nuclei, while points that are farther apart are tumors with more dissimilar cell nuclei. Each line is a vector representing a particular characteristic of the cell nuclei. Values of that characteristic increase along the vector, starting from the origin at (0, 0). For instance, malignant tumors have larger values for the nuclei’s mean area, standard error of the radius, and standard error of concavity than benign tumors. Similarly, benign tumors tend to have larger standard errors of nuclei symmetry than malignant tumors.
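The coordinates behind a plot of this kind can be sketched as follows. The text does not say which ordination method produced the scatterplot, so this example assumes a principal component analysis (PCA) biplot computed with NumPy. The dataset (scikit-learn's bundled Wisconsin breast cancer data) and the four feature names are also assumptions, chosen to match the characteristics mentioned above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Assumed feature names from scikit-learn's dataset that correspond to the
# characteristics named in the text ("error" means standard error).
FEATURES = ["mean area", "radius error", "concavity error", "symmetry error"]


def pca_biplot_coordinates(X):
    """Return 2-D point coordinates (scores) and feature vectors (loadings)."""
    # Standardize so that features on different scales contribute equally.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # The SVD of the standardized data gives the principal axes in the rows of Vt.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :2] * S[:2]   # one row per tumor: its position in the plot
    loadings = Vt[:2].T         # one row per feature: its vector from the origin
    return scores, loadings


data = load_breast_cancer()
names = list(data.feature_names)
columns = [names.index(feature) for feature in FEATURES]
scores, loadings = pca_biplot_coordinates(data.data[:, columns])

# Each feature's vector points in the direction along which its values increase.
for feature, (x, y) in zip(FEATURES, loadings):
    print(f"{feature}: ({x:.2f}, {y:.2f})")
```

Plotting the rows of scores as points, colored by data.target, and drawing each row of loadings as a line from the origin gives a figure with the same structure as the one described, although the exact layout depends on the ordination method and features used.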